R package workshop for BEST-COST
2025-03-24
Part 1 (today)
Get to know healthiar
No need to code!
Between today and next week
Install healthiar (& RStudio) - We’re here to help!
Do the 2 exercises
Part 2 (next week on 2 April) - in R Studio
Recap workshop part 1 & exercises walk through
Mock case studies - Your turn!
Conclusion
1st hour
About healthiar
The healthiar package in RStudio
Examples
Example: attribute_health() using RR
Example: attribute_health() with RR & uncertainty
Example: attribute_health() with AR
Q&A I - Please save your questions (and note the slide number)!
Break
2nd hour
Examples
Example: iteration with attribute_health()
compare()
monetize()
socialize()
Additional features
Post-healthiar workflow
Exercises
Q&A II - Please save your questions!
healthiarhealthiar 1/5healthiar is an R package (= collection of R functions)
The healthiar functions allow you to quantify and monetize the health impacts of environmental stressors (air pollution & noise)
Let’s jump right in, with an example of a healthiar R function call
healthiar 2/5Figure: healthiar overview. DALY = disability-adjusted life years; GBD = global burden of disease; MDI = multidimensional deprivation index; PAF = population attributable fraction; PIF = population impact fraction; YLD = years lived with disability; YLL = years of life lost
healthiar 3/5healthiar core family members (functions)
attribute_health() to env. exposure with either relative or absolute riskattribute_lifetable() life table analysis (RR & AR)summary_uncertainty() Monte Carlo simulationattribute_mod() modify an existing assessmentcompare() two scenariosmonetize() health impactscba() cost-benefit analysissocialize() to discover inequalities in health impactsget_mdi() creates the BEST-COST MDI (Multidimensional Deprivation Index)get_daly() by adding up YLL & YLDhealthiar 4/5BEST-COST GitHub repository
Let’s check out the BEST-COST GitHub repo and the README file
healthiar 5/5Installation & getting started
The README file contains the following information
healthiar R packagehealthiar package in RStudiohealthiar in RStudio 1/3RStudio startup screen
healthiar in RStudio 2/3Post installation, you can access the healthiar package landing page in RStudio by going to the Packages tab and then clicking on the healthiar package.
healthiar in RStudio 3/3Landing page of the healthiar package in RStudio, where you find the package vignettes and function documentation.
= long-form guide to a package
healthiar step-by-step and contains (reproducible) examples. You can open the intro_to_healthiar vignette within RStudio or as a HTML within your browser.Note
The vignette is a work in progress: we appreciate any feedback or suggestions you might have to make it more useful to future users!
You can access the intro vignette by
going to the Packages tab in RStudio, scrolling to the healthiar package and clicking on healthiar > User guides, package vignettes and other documentation > HTML
running *browseVignettes("healthiar")* and clicking on HTML on the page that pops up
Tip
Click on a function name in the package landing page to access its documentation or run ?attribute_health
Any function documentation contains the following sections:
Title Essence of the function
Description What does the function do exactly?
Usage Bare-minimum examples of how to use the function, including default argument value(s)
Arguments Short description of each function argument, e.g. input type (numeric vs. string), options (if available), how each argument affects the output, …
Details (optional) Additional details about the function
Value Information about the function output
Examples (optional) Shows how the function works
Any arguments without a = symbol after the name have no default and must be user-specified
For a more detailed function documentation walk through see the Appendix
attribute_health() with RRattribute_health() with RRGoal: attribute COPD cases to air pollution
Tip
healthiar comes with some example data that start with exdat_ that allow you to test functions.
results_pm_copd <- attribute_health(
erf_shape = "log_linear",
rr_central = exdat_pm_copd$relative_risk,
rr_lower = exdat_pm_copd$relative_risk_lower,
rr_upper = exdat_pm_copd$relative_risk_upper,
rr_increment = 10,
exp_central = exdat_pm_copd$mean_concentration,
cutoff_central = exdat_pm_copd$cut_off_value,
# bhd_central = exdat_pm_copd$incidents_per_100_000_per_year/1E5*exdat_pm_copd$population_at_risk,
bhd_central = exdat_pm_copd$incidence # Uncomment once change committed to main
) Every attribute output initially consists of two main lists (“folders”), and additional sub-lists (“sub-folders”)
health_main contains the main results
health_detailed contains more detailed results and additional information
impact_raw contains detailed results
input_table contains the input data as entered in the function call
args is a list of the function arguments as used by R in the background
The output tables are in the tibble format, which is a modern version of the original data frame, and can be used like a data frame
Tip
Different ways exist and you might have a personal preference! However, you might encounter all options.
Go to the Environment tab in RStudio …
… and click on a variable to “open” it.
Alternatively, you can use
View(results_pm_copd), which has the same effect.
results_pm_copd$health_main$impact_rounded
Note: after typing the $ sign you can see all available options by pressing the tab key and use the arrows & tab keys to select an option (or alternatively use the mouse)
results_pm_copd[["health_main"]]
Note: if the cursor is located within the square braces you can see all available options by pressing the tab key
Using the purrr::pluck function to select a list and then the dplyr::pull function extract values from a specified column
results_pm_copd |> purrr::pluck("health_main") |> dplyr::pull("impact_rounded")
Note: available options can’t be displayed automatically using these functions -> better suited for a more permanent analysis script
| impact_rounded | impact | pop_fraction | erf_ci | rr | exp | bhd |
|---|---|---|---|---|---|---|
| 3502 | 3501.962 | 0.1138961 | central | 1.369 | 8.85 | 30747 |
| 1353 | 1353.066 | 0.0440064 | lower | 1.124 | 8.85 | 30747 |
| 5474 | 5473.888 | 0.1780300 | upper | 1.664 | 8.85 | 30747 |
Tip
Each row shows a result obtained with all the input data & calculation pathway specifications shown in that row
Some of the most relevant columns include:
rr_central, ..._lower or ..._upper was used to obtain impactattribute_health() with RR & uncertaintyattribute_health() with RR & uncertaintyGoal: attribute lung cancer deaths to PM2.5 exposure
Tip
See the intro vignette for a detailed description of output columns.
The health_detailed output table contains all different combinations of the arguments with uncertainty. E.g. rr_central with exp_lower and bhd_upper, …
| exp_ci | bhd_ci | erf_ci | pop_fraction | impact | geo_id_disaggregated | is_lifetable | prop_pop_exp | rr_increment | erf_shape | exposure_name | approach_risk | exposure_dimension | exposure_type | exp | rr | bhd | cutoff_ci | cutoff | pop_fraction_type | rr_conc | impact_rounded |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| central | central | central | 0.1138961 | 3501.9619 | 1 | FALSE | 1 | 10 | log_linear | NA | relative_risk | 1 | population_weighted_mean | 8.85 | 1.369 | 30747 | central | 5 | paf | 1.128536 | 3502 |
| central | lower | central | 0.1138961 | 3189.0894 | 1 | FALSE | 1 | 10 | log_linear | NA | relative_risk | 1 | population_weighted_mean | 8.85 | 1.369 | 28000 | central | 5 | paf | 1.128536 | 3189 |
| central | upper | central | 0.1138961 | 3644.6736 | 1 | FALSE | 1 | 10 | log_linear | NA | relative_risk | 1 | population_weighted_mean | 8.85 | 1.369 | 32000 | central | 5 | paf | 1.128536 | 3645 |
| central | central | lower | 0.0440064 | 1353.0658 | 1 | FALSE | 1 | 10 | log_linear | NA | relative_risk | 1 | population_weighted_mean | 8.85 | 1.124 | 30747 | central | 5 | paf | 1.046032 | 1353 |
| central | lower | lower | 0.0440064 | 1232.1801 | 1 | FALSE | 1 | 10 | log_linear | NA | relative_risk | 1 | population_weighted_mean | 8.85 | 1.124 | 28000 | central | 5 | paf | 1.046032 | 1232 |
| central | upper | lower | 0.0440064 | 1408.2058 | 1 | FALSE | 1 | 10 | log_linear | NA | relative_risk | 1 | population_weighted_mean | 8.85 | 1.124 | 32000 | central | 5 | paf | 1.046032 | 1408 |
| central | central | upper | 0.1780300 | 5473.8882 | 1 | FALSE | 1 | 10 | log_linear | NA | relative_risk | 1 | population_weighted_mean | 8.85 | 1.664 | 30747 | central | 5 | paf | 1.216589 | 5474 |
| central | lower | upper | 0.1780300 | 4984.8398 | 1 | FALSE | 1 | 10 | log_linear | NA | relative_risk | 1 | population_weighted_mean | 8.85 | 1.664 | 28000 | central | 5 | paf | 1.216589 | 4985 |
| central | upper | upper | 0.1780300 | 5696.9598 | 1 | FALSE | 1 | 10 | log_linear | NA | relative_risk | 1 | population_weighted_mean | 8.85 | 1.664 | 32000 | central | 5 | paf | 1.216589 | 5697 |
| lower | central | central | 0.0899213 | 2764.8092 | 1 | FALSE | 1 | 10 | log_linear | NA | relative_risk | 1 | population_weighted_mean | 8.00 | 1.369 | 30747 | central | 5 | paf | 1.098806 | 2765 |
| lower | lower | central | 0.0899213 | 2517.7955 | 1 | FALSE | 1 | 10 | log_linear | NA | relative_risk | 1 | population_weighted_mean | 8.00 | 1.369 | 28000 | central | 5 | paf | 1.098806 | 2518 |
| lower | upper | central | 0.0899213 | 2877.4806 | 1 | FALSE | 1 | 10 | log_linear | NA | relative_risk | 1 | population_weighted_mean | 8.00 | 1.369 | 32000 | central | 5 | paf | 1.098806 | 2877 |
| lower | central | lower | 0.0344604 | 1059.5528 | 1 | FALSE | 1 | 10 | log_linear | NA | relative_risk | 1 | population_weighted_mean | 8.00 | 1.124 | 30747 | central | 5 | paf | 1.035690 | 1060 |
| lower | lower | lower | 0.0344604 | 964.8902 | 1 | FALSE | 1 | 10 | log_linear | NA | relative_risk | 1 | population_weighted_mean | 8.00 | 1.124 | 28000 | central | 5 | paf | 1.035690 | 965 |
| lower | upper | lower | 0.0344604 | 1102.7316 | 1 | FALSE | 1 | 10 | log_linear | NA | relative_risk | 1 | population_weighted_mean | 8.00 | 1.124 | 32000 | central | 5 | paf | 1.035690 | 1103 |
| lower | central | upper | 0.1416706 | 4355.9450 | 1 | FALSE | 1 | 10 | log_linear | NA | relative_risk | 1 | population_weighted_mean | 8.00 | 1.664 | 30747 | central | 5 | paf | 1.165054 | 4356 |
| lower | lower | upper | 0.1416706 | 3966.7760 | 1 | FALSE | 1 | 10 | log_linear | NA | relative_risk | 1 | population_weighted_mean | 8.00 | 1.664 | 28000 | central | 5 | paf | 1.165054 | 3967 |
| lower | upper | upper | 0.1416706 | 4533.4583 | 1 | FALSE | 1 | 10 | log_linear | NA | relative_risk | 1 | population_weighted_mean | 8.00 | 1.664 | 32000 | central | 5 | paf | 1.165054 | 4533 |
| upper | central | central | 0.1453304 | 4468.4726 | 1 | FALSE | 1 | 10 | log_linear | NA | relative_risk | 1 | population_weighted_mean | 10.00 | 1.369 | 30747 | central | 5 | paf | 1.170043 | 4468 |
| upper | lower | central | 0.1453304 | 4069.2501 | 1 | FALSE | 1 | 10 | log_linear | NA | relative_risk | 1 | population_weighted_mean | 10.00 | 1.369 | 28000 | central | 5 | paf | 1.170043 | 4069 |
| upper | upper | central | 0.1453304 | 4650.5716 | 1 | FALSE | 1 | 10 | log_linear | NA | relative_risk | 1 | population_weighted_mean | 10.00 | 1.369 | 32000 | central | 5 | paf | 1.170043 | 4651 |
| upper | central | lower | 0.0567717 | 1745.5580 | 1 | FALSE | 1 | 10 | log_linear | NA | relative_risk | 1 | population_weighted_mean | 10.00 | 1.124 | 30747 | central | 5 | paf | 1.060189 | 1746 |
| upper | lower | lower | 0.0567717 | 1589.6063 | 1 | FALSE | 1 | 10 | log_linear | NA | relative_risk | 1 | population_weighted_mean | 10.00 | 1.124 | 28000 | central | 5 | paf | 1.060189 | 1590 |
| upper | upper | lower | 0.0567717 | 1816.6929 | 1 | FALSE | 1 | 10 | log_linear | NA | relative_risk | 1 | population_weighted_mean | 10.00 | 1.124 | 32000 | central | 5 | paf | 1.060189 | 1817 |
| upper | central | upper | 0.2247829 | 6911.4001 | 1 | FALSE | 1 | 10 | log_linear | NA | relative_risk | 1 | population_weighted_mean | 10.00 | 1.664 | 30747 | central | 5 | paf | 1.289961 | 6911 |
| upper | lower | upper | 0.2247829 | 6293.9214 | 1 | FALSE | 1 | 10 | log_linear | NA | relative_risk | 1 | population_weighted_mean | 10.00 | 1.664 | 28000 | central | 5 | paf | 1.289961 | 6294 |
| upper | upper | upper | 0.2247829 | 7193.0531 | 1 | FALSE | 1 | 10 | log_linear | NA | relative_risk | 1 | population_weighted_mean | 10.00 | 1.664 | 32000 | central | 5 | paf | 1.289961 | 7193 |
attribute_health() with ARattribute_health() with ARGoal: attribute cases of high annoyance (HA) to noise exposure
| exposure_dimension | exp | pop_exp | impact |
|---|---|---|---|
| 1 | 57.5 | 387500 | 49674.594 |
| 2 | 62.5 | 286000 | 50788.595 |
| 3 | 67.5 | 191800 | 46813.105 |
| 4 | 72.5 | 72200 | 23657.232 |
| 5 | 77.5 | 7700 | 3298.314 |
attribute_health()attribute_health()Goal: attribute disease cases to PM2.5 exposure in multiple geographic units, such as municipalities, provinces, countries, …
results_iteration <- attribute_health(
geo_id_disaggregated = c("Zurich", "Basel", "Geneva", "Ticino", "Valais"),
geo_id_aggregated = c("Ger","Ger","Fra","Ita","Fra"),
rr_central = 1.369,
rr_increment = 10,
cutoff_central = 5,
erf_shape = "log_linear",
exp_central = list(11, 11, 10, 8, 7),
bhd_central = list(4000, 2500, 3000, 1500, 500)
)Here the we want to aggregate results by language region ("Ger", "Fra", "Ita")
results_iteration <- attribute_health(
geo_id_disaggregated = c("Zurich", "Basel", "Geneva", "Ticino", "Valais"),
geo_id_aggregated = c("Ger","Ger","Fra","Ita","Fra"),
rr_central = 1.369,
rr_increment = 10,
cutoff_central = 5,
erf_shape = "log_linear",
exp_central = as.list(c(11, 11, 10, 8, 7)),
bhd_central = as.list(c(4000, 2500, 3000, 1500, 500))
)Tip
For iterations, enter geo unit-specific inputs as a list
Feed unique geo ID’s as a vector to the geo_id_disaggregated argument (e.g. municipality names)
Optional: aggregate geo unit-specific results by providing higher-level ID’s (e.g. region names) to the geo_id_aggregated argument (as a vector)
Tip
The main output contains aggregated results if available, or disaggregated results if no aggregation ID was provided
| geo_id_aggregated | impact_rounded | erf_ci | exp_ci | bhd_ci |
|---|---|---|---|---|
| Fra | 466 | central | central | central |
| Ger | 1116 | central | central | central |
| Ita | 135 | central | central | central |
| geo_id_disaggregated | geo_id_aggregated | impact_rounded |
|---|---|---|
| Zurich | Ger | 687 |
| Basel | Ger | 429 |
| Geneva | Fra | 436 |
| Ticino | Ita | 135 |
| Valais | Fra | 30 |
compare()compare()Goal: comparison of two scenarios
attribute_health() to calculate burden of scenarios A & Bcompare() to compare scenarios A & BThe compare() results are very similar to attribute_health() results:
health_main contains main comparison results
health_detailed
impact_raw raw comparison results
scenario_1 contains results of scenario 1 (scenario A in our case)
scenario_2 contains results of scenario 2 (scenario B in our case)
| impact | impact_rounded | impact_1 | impact_2 | bhd | exp_1 | exp_2 | rr_conc_1 | rr_conc_2 |
|---|---|---|---|---|---|---|---|---|
| 773.5564 | 774 | 1050.86 | 277.304 | 25000 | 8.85 | 6 | 1.043879 | 1.011217 |
monetize()monetize()Different monetization pathways / options are available
monetize() outputmonetize() adds two main lists (“folders”) to the inputted health impacts
monetization_main contains total results
monetization_detailed contains detailed results
by_year yearly results| impact | monetized_impact | discount_rate | valuation | monetized_impact_before_inflation_and_discount | monetized_impact_after_inflation_and_discount |
|---|---|---|---|---|---|
| 3501.962 | 60416.46 | 0.03 | 20 | 70039.24 | 60416.46 |
socialize()socialize() resultssocialize() adds two main lists (“folders”) to the inputted health impacts
social_main contains total results
social_detailed
results_detailed
overview_quantiles
| difference_value | difference_type | comment | overall | last |
|---|---|---|---|---|
| 14.5680867 | absolute | It can be interpreted as impact attributable to deprivation | 84.43901 | 69.87092 |
| 0.1725279 | relative | It can be interpreted as fraction attributable to deprivation | 84.43901 | 69.87092 |
Additional existing healthiar functions
attribute_health() with the argument approach_multiexposure for analysis with correlated exposuresattribute_lifetable() for YLLget_daly as the sum of YLL and YLDattribute_mod() modify an existing healthiar assessmentcba() cost-benefit analysisget_mdi() creates the BEST-COST Multidimensional Deprivation Index (MDI)summarize_uncertainty() with Monte Carlo simulationNote
Coming (relatively) soon: function examples in the intro vignette!
healthiar workflowVisualization is out of scope of healthiar. You can visualize in
ggplot2 package (online book by the creator)1.1. How many lung cancer cases are attributable to PM2.5 exposure in a single year?
- With a **cutoff of 5** microgram/m\^3 ?
- With **no cutoff**?
1.2. What’s the direct treatment costs per year in the no cutoff scenario?
Data
Solutions in the Appendix
2.1. How many people are highly annoyed due to noise exposure
Advanced
2.2. How large is the health burden measured in YLD?
attribute_health to see which extra argument must be used to express the burden in YLDData
healthiar example data set exdat_noise_haexposure_mean90-3.1162*c+0.0342*c^20.02Solutions in the Appendix
Tip
Once healthiar is installed and loaded you can use the example data directly or add it to your Environment in RStudio by assigning it to a variable, e.g. data <- exdat_noise_ha
Remember: for the workshop next week, please install & test healthiar, because you will have to programm in RStudio!
If you encounter challenges during installation, get in touch with us : )
Thank you for your attention, and happy coding! : )
The title summarizes the function of the function (hehe) in one sentence.
The title shows up next to the function name in the package landing page.
Important
Any arguments that appear without a = symbol after them in the usage section have to be user-specified in all function call.
Note
The inputs to the arguments in the usage section are default inputs
numeric vs. string) & input options (if available) are specified.Warning
Depending on the function, this section might be not very developed at the moment. Sometimes more function details are found in the intro vignette.
Tip
By clicking on Run examples the example(s) are executed and the output shown
Run examples (see previous slide)Solution Exercise 1
How many COPD cases are attributable to PM2.5 exposure in a single year?
results_with_cutoff <- attribute_health(
erf_shape = "linear",
rr_central = 1.06,
rr_increment = 10, # μg / m^3
exp_central = 10, # μg / m^3
cutoff_central = 5, # μg / m^3
bhd_central = 100000 # baseline health data: COPD incidence
)
# Attributable impact
print(results_with_cutoff$health_main$impact_rounded)[1] 2913
Solution Exercise 2
results_noise_ha <- attribute_health(
approach_risk = "absolute_risk",
exp_central = exdat_noise_ha$exposure_mean,
pop_exp = exdat_noise_ha$population_exposed_total,
erf_eq_central = "90-3.1162*c+0.0342*c^2"
)
# Total attributable cases of high annoyance
print(results_noise_ha$health_main$impact_rounded)[1] 278894
results_noise_ha <- attribute_health(
approach_risk = "absolute_risk",
exp_central = exdat_noise_ha$exposure_mean,
pop_exp = exdat_noise_ha$population_exposed_total,
erf_eq_central = "90-3.1162*c+0.0342*c^2"
)
# Attributable cases of high annoyance from exposure in highest noise band
print(results_noise_ha$health_detailed$impact_raw$impact[5])[1] 4150.935
results_noise_ha_yld <- attribute_health(
approach_risk = "absolute_risk",
exp_central = exdat_noise_ha$exposure_mean,
pop_exp = exdat_noise_ha$population_exposed_total,
erf_eq_central = "90-3.1162*c+0.0342*c^2",
dw_central = 0.02,
duration_central = 1
)
# Attributable number of YLD due to high annoyance
print(results_noise_ha_yld$health_main$impact_rounded)[1] 5578
socialize()socialize()featuresStand-alone use the function can either be used with an
attribute_...output or an external health impactUse of deprivation indicator BEST-COST Multidimensional Deprivation Index (MDI) or any other (single) deprivation indicator, e.g. house hold income
First create an
attribute_health()using example data setexdat_mdiThen run
socialize()